Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Authors

  • Soham De
  • Abhay Kumar Yadav
  • David W. Jacobs
  • Tom Goldstein
Abstract

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative “big batch” SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD methods without requiring convexity of the objective function. The high-fidelity gradients enable automated learning rate selection and do not require stepsize decay. For this reason, big batch methods are easily automated and can run with little or no user oversight.
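The sketch below is not the authors' exact algorithm, only a minimal illustration of the batch-growing idea in the abstract: the batch is doubled until the estimated gradient noise is small relative to the gradient norm (an approximate signal-to-noise test), after which an ordinary gradient step with a fixed stepsize is taken. The function grad_fn, the threshold theta, and the doubling rule are assumptions made for this example.

    import numpy as np

    def big_batch_sgd_sketch(grad_fn, x0, data, batch0=32, theta=0.5,
                             lr=0.1, max_iters=100, rng=None):
        # Illustrative big-batch SGD: grow the batch until a gradient
        # signal-to-noise test passes, then take a fixed-stepsize step.
        # grad_fn(x, sample) returns the gradient of one example's loss at x;
        # theta, lr, and the doubling rule are assumptions for this sketch.
        rng = np.random.default_rng() if rng is None else rng
        x, batch = np.asarray(x0, dtype=float), batch0
        for _ in range(max_iters):
            while True:
                idx = rng.choice(len(data), size=min(batch, len(data)), replace=False)
                grads = np.stack([grad_fn(x, data[i]) for i in idx])
                g_mean = grads.mean(axis=0)
                # Estimated variance of the batch-mean gradient.
                noise = grads.var(axis=0, ddof=1).sum() / len(idx)
                # Noise small relative to signal, or whole dataset used: stop growing.
                if noise <= theta * np.dot(g_mean, g_mean) or batch >= len(data):
                    break
                batch = min(2 * batch, len(data))  # grow the batch and retry
            x = x - lr * g_mean                    # fixed stepsize, no decay schedule
        return x

For a least-squares fit, for instance, grad_fn could return the single-example residual gradient; the point is only that the growing batch, rather than a decaying stepsize, absorbs the shrinking gradient signal near a solution.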


Related articles

Automated Inference with Adaptive Batches

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative “big batch” SGD schemes that adaptively grow the batch size ove...


The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

Stochastic Gradient Descent (SGD) with small mini-batch is a key component in modern large-scale machine learning. However, its efficiency has not been easy to analyze as most theoretical results require adaptive rates and show convergence rates far slower than that for gradient descent, making computational comparisons difficult. In this paper we aim to clarify the issue of fast SGD convergenc...


AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch siz...


Rectified linear neural networks with tied-scalar regularization for LVCSR

It is known that rectified linear deep neural networks (RL-DNNs) can consistently outperform the conventional pretrained sigmoid DNNs even with a random initialization. In this paper, we present another interesting and useful property of RL-DNNs that we can learn RL-DNNs with a very large batch size in stochastic gradient descent (SGD). Therefore, the SGD learning can be easily parallelized amon...


Removing Noise in On-Line Search using Adaptive Batch Sizes

Stochastic (on-line) learning can be faster than batch learning. However, at late times, the learning rate must be annealed to remove the noise present in the stochastic weight updates. In this annealing phase, the convergence rate (in mean square) is at best proportional to 1/T where T is the number of input presentations. An alternative is to increase the batch size to remove the noise. In th...
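As a rough illustration of the trade-off this excerpt describes (our own back-of-the-envelope comparison, not the cited paper's analysis), annealing the stepsize and growing the batch attack the same gradient noise in two different ways:

    % Assumed setting: strongly convex objective, per-example gradient noise
    % with variance \sigma^2.
    %
    % Annealed stepsize:
    \eta_t = \frac{c}{t}
    \quad\Longrightarrow\quad
    \mathbb{E}\,\|w_T - w^\ast\|^2 = O\!\bigl(\sigma^2 / T\bigr).
    %
    % Growing batch: keep \eta fixed and average B_t per-example gradients,
    % so each update's noise variance is \sigma^2 / B_t, which is driven down
    % by increasing B_t instead of shrinking \eta_t.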




Journal:
  • CoRR

Volume abs/1610.05792  Issue

Pages  -

Publication date 2016